Removing Noisy Mentions for Distant Supervision (Eliminando Menciones Ruidosas para la Supervisión a Distancia)

Authors

  • Ander Intxaurrondo
  • Mihai Surdeanu
  • Oier Lopez de Lacalle
  • Eneko Agirre
Abstract

Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of mention frequency cut-off, Pointwise Mutual Information and removal of mentions which are far from the feature centroids of relation labels is able to significantly improve the results of two relation extraction models.
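
The three filters named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how each filter is defined, so everything below is an assumption made for illustration: the mention layout `(entity_pair, relation_label, feature_vector)`, the function names, the choice to compute PMI between an entity pair and its relation label, the use of cosine similarity to measure distance from a label centroid, and all thresholds. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical mention layout: (entity_pair, relation_label, feature_vector).


def frequency_cutoff(mentions, max_mentions):
    """Drop all mentions of any entity pair with more than max_mentions mentions.

    Assumption: the cut-off is applied per entity pair.
    """
    counts = Counter(pair for pair, _, _ in mentions)
    return [m for m in mentions if counts[m[0]] <= max_mentions]


def pmi_filter(mentions, min_pmi):
    """Keep mentions whose (entity pair, label) PMI is at least min_pmi.

    Assumption: PMI is estimated from mention counts as
    log(p(pair, label) / (p(pair) * p(label))).
    """
    n = len(mentions)
    pair_counts = Counter(pair for pair, _, _ in mentions)
    label_counts = Counter(label for _, label, _ in mentions)
    joint_counts = Counter((pair, label) for pair, label, _ in mentions)

    def pmi(pair, label):
        return math.log(
            joint_counts[(pair, label)] * n
            / (pair_counts[pair] * label_counts[label])
        )

    return [m for m in mentions if pmi(m[0], m[1]) >= min_pmi]


def centroid_filter(mentions, min_cosine):
    """Keep mentions whose cosine similarity to their label centroid is >= min_cosine.

    Assumption: a centroid is the mean feature vector of a label's mentions.
    """
    sums = {}
    counts = Counter()
    for _, label, vec in mentions:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return [m for m in mentions if cosine(m[2], centroids[m[1]]) >= min_cosine]
```

The filters take and return plain lists, so they can be chained in any order, for example `centroid_filter(pmi_filter(frequency_cutoff(mentions, 50), 0.0), 0.5)`. The thresholds in that call are arbitrary placeholders.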

Similar resources

Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists

Because of polysemy, distant labeling for information extraction leads to noisy training data. We describe a procedure for reducing this noise by using label propagation on a graph in which the nodes are entity mentions, and mentions are coupled when they occur in coordinate list structures. We show that this labeling approach leads to good performance even when off-the-shelf classifiers are us...

Modeling Relations and Their Mentions without Labeled Text

Several recent works on relation extraction have been applying the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as source of supervision. Crucially, these approaches are trained based on the assumption that each sentence which mentions the two related entities is an expression of the given relati...

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. ...

AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding

Distant supervision has been widely used in current systems of fine-grained entity typing to automatically assign categories (entity types) to entity mentions. However, the types so obtained from knowledge bases are often incorrect for the entity mention’s local context. This paper proposes a novel embedding method to separately model “clean” and “noisy” mentions, and incorporates the given typ...

Metodología DoRCU para la Ingeniería de Requerimientos (DoRCU Methodology for Requirements Engineering)

Abstract. DoRCU (Documentación de Requerimientos Centrada en el Usuario, User-Centered Requirements Documentation) is a methodology for Requirements Engineering characterized by its flexibility and user orientation. It takes up the best results of the approaches examined and draws on various methods, techniques, and tools already developed by other authors, but without committing to the guidelines of a paradig...

Publication date: 2013